Simulations
OneMap 3.0 also makes a interface with PedigreeSim software to perform simulations. There are three different ways to simulate maps in OneMap:
- Converting PedigreeSim output to OneMap raw data: function
pedsim2raw
This function just convert the PedigreeSim files to OneMap raw data. The only source of errors are inputated missing data controled by miss.perc argument.
- Converting PedigreeSim output to VCF file simulating allele depth with different distributions: function
pedsim2vcf
Simulating the VCF you can include other sources of errors. The map is simulated in PedigreeSim, and according with each genotype, the choosed statistical distribuition will simulated the reference and alternative alleles counts. The genotypes can be reestimated according with the alleles counts simulated using the genotype callers Updog (function updog_error), polyRAD (function polyRAD_error) and Supermassa (function supermassa_error of supermassa4onemap package.
- Use a chromossome of the reference genome to simulate SNPs, the bi-parental cross and Illumina reads for each individual: workflow
SimulateReads.wdl
The workflow SimulatedReads is available in onemap_workflows repository in github. It performs the simulation of Illumina reads for outcrossing population, the SNP calling in Freebayes and GATK software, the genotype calling in updog, supermassa and polyRAD, and build genetics maps using every combination in OneMap and gusmap. The workflow is wroten in WDL cromwell worflow system and the parameters of every software implented can be easily changed. All the results can be evaluated in a shiny app. See the workflow tutorial to perfom this analysis.
Run PedigreeSim
First, download PedigreeSim java file. It will require java installed. You can run PedigreeSim directly and just use the output files in OneMap or you can use a function named run_pedsim that facilitates this procedure.
The function do not provide every possibilities offered by PedigreeSim software. If you want to change any parameter that is not available in the function, please use directly the PedigreeSim software. Here is the software documentation for more information.
# For outcrossing population
run_pedsim(chromosome = "Chr1", n.marker = 54, tot.size.cm = 100, centromere = 50,
n.ind = 200, mk.types = c("A1", "A2", "A3", "A4", "B1.5", "B2.6", "B3.7",
"C.8", "D1.9", "D1.10", "D1.11", "D1.12", "D1.13",
"D2.14", "D2.15", "D2.16", "D2.17", "D2.18"),
n.types = rep(3,18), pop = "F1", path.pedsim = "/home/rstudio/onemap_ht2/",
name.mapfile = "mapfile.map", name.founderfile="founderfile.gen",
name.chromfile="sim.chrom", name.parfile="sim.par",
name.out="sim_out")
# For F2 population
run_pedsim(chromosome = c("Chr1", "Chr2"), n.marker = c(75, 75),
tot.size.cm = c(100,100), centromere = c(50,50),
n.ind = 200, mk.types = c("A.H.B","C.A", "D.B"),
n.types = rep(50,3), pop = "F2", path.pedsim = "/home/rstudio/onemap_ht2/",
name.mapfile = "mapfile_f2.map", name.founderfile="founderfile_f2.gen",
name.chromfile="sim_f2.chrom", name.parfile="sim_f2.par",
name.out="sim_f2")The function allows to create f2 intercross and backcross populations from bi-parental cross of inbred lines and segregating F1 population from bi-parental cross of heterozygous parents. You can define it in pop argument. You must change the path.pedsim to the path where the PedigreeSim.jar is stores in your system. You can also define the number of chromosomes (argument chromosome), the number of markers in each chromosome (n.marker), the total size of the groups in cM (tot.size.cm), the position of centromere (centromere), number of individuals in the population (n.ind), the marker types (mk.types, see the table in session Creating the data file of Outcrossing populations vignette) and the number of markers of each type (n.types).
We suggest you to open the output files founderfile, chromfile, mapfile and parfile to check if they agree with your intentions before proceed to other analysis.
Simulate OneMap raw data
Once you run PedigreeSim, you should have the output file genotypes.dat. To convert this to OneMap raw file, you just need to specify the cross type (cross), which ones are the parents (parent1 and parent2) and if you want to include missing genotypes (miss.perc). Only cross types outcross and f2 intercross are supported by now.
# For outcrossing population
pedsim2raw(genofile = "sim_out_genotypes.dat", cross="outcross", parent1 = "P1", parent2 = "P2", out.file = "sim_out.example.raw", miss.perc = 0)
onemap.obj <- read_onemap("sim_out.example.raw")## Working...
##
## --Read the following data:
## Type of cross: outcross
## Number of individuals: 200
## Number of markers: 54
## Chromosome information: no
## Position information: no
## Number of traits: 0
# For F2 population
pedsim2raw(genofile = "sim_f2_genotypes.dat", cross="f2 intercross", parent1 = "P1", parent2 = "P2", out.file = "sim_f2.example.raw", miss.perc = 0)
onemap.obj <- read_onemap("sim_f2.example.raw")## Working...
##
## --Read the following data:
## Type of cross: f2
## Number of individuals: 200
## Number of markers: 150
## Chromosome information: no
## Position information: no
## Number of traits: 0
Simulate VCF file
The same output file from PedigreeSim, the genotypes.dat can be used to simulate a VCF file together with the PedigreeSim mapfile and chrom. The advantages to simulate a VCF instead of onemap raw file is that VCF is a standard file format and can store a lot of other information included the allele counts, usually in the field AD or DPR. The pedsim2vcf function can simulate the allele counts using negative binomial or updog distribuitions (argument method). The main paramenters for the distributions are defined with arguments mean.depth, that defines the mean allele depth in the progeny, p.mean.depth that defines the mean allele depth in the parents, argument disper.par defines the dispersion parameter, mean.phred defines the mean phred score of the sequencing technology used. The function can also simulate missing data (miss.perc). Throught argument pos and chr you can define vectors with phisical position and chromosome of each marker. With argument haplo.ref you define which one of the haplotypes in genotypes.dat will the considered the reference. Stablishing phase as TRUE, the VCF will have phased genotypes. After allele counts are simulated, the genotypes are reestimated using a binomial distribution. The VCF generated by this function only have one or two FORMAT fields, the GT and AD (if counts = TRUE). Dominant markers are not supported by this function.
# Dominant markers are not supported then, simulate other dataset with only codominant markers
run_pedsim(chromosome = "Chr1", n.marker = 42, tot.size.cm = 100, centromere = 50,
n.ind = 200, mk.types = c("A1", "A2", "B3.7", "D1.9", "D1.10", "D2.14", "D2.15"),
n.types = rep(6,7), pop = "F1", path.pedsim = "/home/rstudio/onemap_ht2/",
name.mapfile = "mapfile.map", name.founderfile="founderfile.gen",
name.chromfile="sim.chrom", name.parfile="sim.par",
name.out="sim_out")
# For outcrossing population
pedsim2vcf(inputfile = "sim_out_genotypes.dat",
map.file = "mapfile.map",
chrom.file = "sim.chrom",
out.file = "simu_out.vcf",
miss.perc = 0,
counts = TRUE,
mean.depth = 100,
p.mean.depth = 100,
chr.mb = 10,
method = "updog",
mean.phred = 20,
bias=1,
od=0.00001,
pos=NULL,
chr=NULL,
phase = FALSE,
disper.par=2)## Counts simulation changed 0.1885903 % of the given genotypes
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
onemap_out.obj <- onemap_read_vcfR(vcfR.object = vcfR_out.obj,
cross = "outcross", parent1 = "P1", parent2 = "P2")
plot(onemap_out.obj, all=F)# For F2 population
# Dominant markers are not supported by this function
run_pedsim(chromosome = c("Chr1", "Chr2"), n.marker = c(75, 75),
tot.size.cm = c(100,100), centromere = c(50,50),
n.ind = 200, mk.types = c("A.H.B"),
n.types = 150, pop = "F2", path.pedsim = "/home/rstudio/onemap_ht2/",
name.mapfile = "mapfile_f2.map", name.founderfile="founderfile_f2.gen",
name.chromfile="sim_f2.chrom", name.parfile="sim_f2.par",
name.out="sim_f2")
pedsim2vcf(inputfile = "sim_f2_genotypes.dat",
map.file = "mapfile_f2.map",
chrom.file = "sim_f2.chrom",
out.file = "simu_f2.vcf",
miss.perc = 0,
counts = TRUE,
mean.depth = 100,
p.mean.depth = 100,
chr.mb = 10,
method = "updog",
mean.phred = 20,
bias=1,
od=0.001,
pos=NULL,
chr=NULL,
phase = FALSE,
disper.par=2)## Counts simulation changed 0.1444992 % of the given genotypes
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 150
## column count: 212
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 150
## Character matrix gt cols: 212
## skip: 0
## nrows: 150
## row_num: 0
##
Processed variant: 150
## All variants processed
## 1 Markers were removed from the dataset because one or both of the parents are heterozygotes, we do not expect heterozygotes parents in F2 populations.
Graphical view of genotypes and allele counts
Function create_depth_profile generates dispertion graphics with x and y axis pointing respectively the reference and alternative allele counts. The function is only available for biallelic markers and for outcrossing and f2 intercross population. Each dot represent a genotype considering mks markers and inds individuals. If are stablished NULL for both arguments, all markers and individuals are considered. The color of the dots are according with the genotypes present in onemap object (GTfrom = onemap) or in VCF file (GTfrom = vcf) or the color can represent the error rate (1 - highest genotype probability) of each genotype in onemap object (GTfrom = prob). A rds file is generated with the data in the graphic (rds.file). The alpha argument control the transparency of color of each dot, regulate this parameter is a good idea when having big amount of markers and individuals.
# For outcrossing population
create_depths_profile(onemap.obj = onemap_out.obj,
vcfR.object = vcfR_out.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
mks = NULL,
inds = NULL,
GTfrom = "vcf",
alpha = 0.1,
rds.file = "depths_out.rds")## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 28.00 52.00 67.31 90.00 450.00
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 12.0 34.0 43.6 64.0 293.0
# For f2 intercross population
create_depths_profile(onemap.obj = onemap_f2.obj,
vcfR.object = vcfR_f2.obj,
parent1 = "P1",
parent2 = "P2",
f1 = "F1",
vcf.par = "AD",
recovering = FALSE,
mks = NULL,
inds = NULL,
GTfrom = "vcf",
alpha = 0.1,
rds.file = "depths_f2.rds")## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 28.00 51.00 66.38 87.00 650.00
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 9.00 33.00 42.95 63.00 448.00
Reestimate genotypes
If you choose to simulate allele counts in pedsim2vcf it is also possible to reestimate the genotypes using this simulated allele counts with other distributions. This will include in your data errors comming from random sampling if considering only poisson or negative binomial distribuitions and/or dispersion, outliers and bias errors if using updog model. Other advantage is that reestimating genotypes with any of this softwares you can obtain genotypes probabilities to be used in the map building instead of only genotypes.
warning: The reestimation of genotypes is only performed in biallelic codominant markers. If you have multiallelic markers in your VCF, they will be keept with the same genotype.
Updog, polyRAD and supermassa are software designed for genotyping polyploid species. This is a more complex procedure compared with diploid species. These software consider not only the proportion of alleles to define the genotypes, but other aspects as the expected distribuition in the progeny according with parents genotypes. See more about them in respective manuals.
Genotypes probabilities usage
OneMap 3.0 have three options to define error probability in HMM emission phase. The default method is consider a global error rate for every genotype \(10^{-5}\). Until this present version, only this procedure was implemented. Now, with the function create_probs, we offer the option to change the global error rate (global_error argument), or consider a error rate by genotype (genotypes_errors) or a genotype probability for each genotype (genotypes_probs).
rodar tres familias com probs, errors e global erro de 0.05 para ver qual é melhor - verificar qual o metodo utilizado nas cinco familias q ja foram rodadas, adicionar uma simulação variando o erro global para verificar diferença nos haplotipos depois
With updog
The function updog_genotype make interface of OneMap with Updog to perform the genotype calling.
onemap_geno.updog <- updog_genotype(vcfR.object=vcfR_out.obj,
onemap.object= onemap_out.obj,
vcf.par = "AD",
parent1="P1",
parent2="P2",
f1=NULL,
recovering = FALSE,
mean_phred = 20,
cores = 4,
depths = NULL,
global_error = NULL,
use_genotypes_errors = TRUE,
use_genotypes_probs = FALSE)## This approach changed 29 % of the genotypes from biallelic markers
## New onemap object contains 18 biallelic markers
create_depths_profile(onemap.obj = onemap_geno.updog,
vcfR.object = vcfR_out.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
GTfrom = "vcf") ## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 28.00 52.00 67.31 90.00 450.00
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 12.0 34.0 43.6 64.0 293.0
With polyRAD
The function polyRAD_genotype make interface of OneMap with polyRAD to perform the genotype calling.
onemap_geno.polyRAD <- polyRAD_genotype(vcf="simu_out.vcf",
onemap.obj = onemap_out.obj,
parent1="P1",
parent2="P2",
f1="F1",
crosstype="outcross",
global_error = NULL,
use_genotypes_errors = TRUE,
use_genotypes_probs = FALSE,
rm_multiallelic = F)## This approach changed 30.30556 % of the genotypes
create_depths_profile(onemap.obj = onemap_geno.polyRAD,
vcfR.object = vcfR_out.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
GTfrom = "vcf") ## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 28.00 52.00 67.31 90.00 450.00
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 12.0 34.0 43.6 64.0 293.0
With supermassa
The function supermassa_genotype make interface of OneMap with supermassa to perform the genotype calling.
library(supermassa4onemap)
onemap_geno.supermassa <- supermassa_genotype(vcfR.object=vcfR_out.obj,
onemap.object= onemap_out.obj,
vcf.par = "AD",
parent1="P1",
parent2="P2",
f1=NULL,
recovering = FALSE,
mean_phred = 20,
cores = 4,
depths = NULL,
global_error = NULL,
use_genotypes_errors = FALSE,
use_genotypes_probs = TRUE,
rm_multiallelic = F)## 27.94444 % of the genotypes were changed by this approach
## 0 were recovered from vcf
## 0 markers from old onemap object were considered non-informative and removed of analysis
create_depths_profile(onemap.obj = onemap_geno.supermassa,
vcfR.object = vcfR_out.obj,
parent1 = "P1",
parent2 = "P2",
vcf.par = "AD",
recovering = FALSE,
GTfrom = "vcf") ## Summary of reference counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.00 28.00 52.00 67.31 90.00 450.00
## Summary of alternative counts:
## Min. 1st Qu. Median Mean 3rd Qu. Max.
## 1.0 12.0 34.0 43.6 64.0 293.0
Example
Here we want to compare the best map using updog counts simulations and updog, polyrad and supermassa genotyping and genotypes probabilities. As a simple example, we will simulate only one family and one chromosome.
for(depth in c(100, 50, 10)){
run_pedsim(chromosome = "Chr1", n.marker = 42, tot.size.cm = 100, centromere = 50,
n.ind = 200, mk.types = c("A1", "A2", "B3.7", "D1.9", "D1.10", "D2.14", "D2.15"),
n.types = rep(6,7), pop = "F1", path.pedsim = "/home/rstudio/onemap_ht2/",
name.mapfile = "mapfile.map", name.founderfile="founderfile.gen",
name.chromfile="sim.chrom", name.parfile="sim.par",
name.out="sim_out")
# VCF with unmodified genotypes
pedsim2vcf(inputfile = "sim_out_genotypes.dat",
map.file = "mapfile.map",
chrom.file = "sim.chrom",
out.file = "simu_out.vcf",
miss.perc = 0,
counts = FALSE)
vcfR_out.obj <- read.vcfR("simu_out.vcf")
onemap_out.obj <- onemap_read_vcfR(vcfR.object = vcfR_out.obj, cross = "outcross", parent1 = "P1", parent2 = "P2")
twopts <- rf_2pts(onemap_out.obj)
seq_ord <- make_seq(twopts, order(as.numeric(onemap_out.obj$POS)))
map_real <- map(seq_ord, phase_cores = 4)
p <- rf_graph_table(map_real, main = paste0("Real_",depth))
ggsave(p, filename = paste0("real", depth,".jpeg"))
# Simulating counts
pedsim2vcf(inputfile = "sim_out_genotypes.dat",
map.file = "mapfile.map",
chrom.file = "sim.chrom",
out.file = "simu_out.vcf",
miss.perc = 0,
counts = TRUE,
mean.depth = depth,
p.mean.depth = depth,
chr.mb = 10,
method = "updog",
mean.phred = 20,
bias=0.8,
od=0.0001,
pos=NULL,
chr=NULL,
phase = FALSE,
disper.par=2)
vcfR_out.obj <- read.vcfR("simu_out.vcf")
onemap_out.obj <- onemap_read_vcfR(vcfR.object = vcfR_out.obj, cross = "outcross", parent1 = "P1", parent2 = "P2")
onemap_geno.updog <- updog_genotype(vcfR.object=vcfR_out.obj,
onemap.object= onemap_out.obj,
vcf.par = "AD",
parent1="P1",
parent2="P2",
f1=NULL,
recovering = TRUE,
mean_phred = 20,
cores = 4,
depths = NULL,
global_error = NULL,
use_genotypes_errors = FALSE,
use_genotypes_probs = TRUE,
rm_multiallelic = F)
twopts <- rf_2pts(onemap_geno.updog)
seq_ord <- make_seq(twopts, order(as.numeric(onemap_geno.updog$POS)))
map_updog <- map(input.seq = seq_ord, phase_cores = 4, rm_unlinked = T) # If HMM find problems between two markers, one of them will be automatically discarted and the sequence of markers without it will be returned
while(is(map_updog, "vector")){ # if the result is a sequence of marker numbers, then HMM is run again
# This will be repeteated until HMM can run with no problems
seq_temp <- make_seq(twopts, map_updog)
map_updog <- map(input.seq = seq_temp, phase_cores = 4, rm_unlinked = T)
}
p <- rf_graph_table(map_updog, main = paste0("Updog depth",depth))
ggsave(p, filename = paste0("updog", depth,".jpeg"))
onemap_geno.polyRAD <- polyRAD_genotype(vcf="simu_out.vcf",
onemap.obj = onemap_out.obj,
parent1="P1",
parent2="P2",
f1="F1",
crosstype="outcross",
global_error = NULL,
use_genotypes_errors = FALSE,
use_genotypes_probs = TRUE,
rm_multiallelic = F)
twopts <- rf_2pts(onemap_geno.polyRAD)
seq_ord <- make_seq(twopts, order(as.numeric(onemap_geno.polyRAD$POS)))
map_polyRAD <- map(input.seq = seq_ord, phase_cores = 4, rm_unlinked = T)
while(is(map_polyRAD, "vector")){
seq_temp <- make_seq(twopts, map_polyRAD)
map_polyRAD <- map(input.seq = seq_temp, phase_cores = 4, rm_unlinked = T)
}
p <- rf_graph_table(map_polyRAD, main = paste0("polyRAD depth", depth))
ggsave(p, filename = paste0("polyrad", depth,".jpeg"))
onemap_geno.supermassa <- supermassa_genotype(vcfR.object=vcfR_out.obj,
onemap.object= onemap_out.obj,
vcf.par = "AD",
parent1="P1",
parent2="P2",
f1=NULL,
recovering = FALSE,
mean_phred = 20,
cores = 4,
depths = NULL,
global_error = NULL,
use_genotypes_errors = FALSE,
use_genotypes_probs = TRUE,
rm_multiallelic = F)
twopts <- rf_2pts(onemap_geno.supermassa)
seq_ord <- make_seq(twopts, order(as.numeric(onemap_geno.supermassa$POS)))
map_supermassa <- map(input.seq = seq_ord, phase_cores = 4, rm_unlinked = T)
while(is(map_supermassa, "vector")){
seq_temp <- make_seq(twopts, map_supermassa)
map_supermassa <- map(input.seq = seq_temp, phase_cores = 4, rm_unlinked = T)
}
p <- rf_graph_table(map_supermassa, main = paste0("supermassa depth", depth))
ggsave(p, filename = paste0("supermassa", depth,".jpeg"))
onemap_0.05 <- create_probs(onemap_geno.updog, global_error = 0.05)
twopts <- rf_2pts(onemap_0.05)
seq_ord <- make_seq(twopts, order(as.numeric(onemap_0.05$POS)))
map_0.05 <- map(input.seq = seq_ord, phase_cores = 4, rm_unlinked = T)
while(is(map_0.05, "vector")){
seq_temp <- make_seq(twopts, map_0.05)
map_0.05 <- map(input.seq = seq_temp, phase_cores = 4, rm_unlinked = T)
}
p <- rf_graph_table(map_0.05, main = paste0("0.05", depth))
ggsave(p, filename = paste0("0.05", depth,".jpeg"))
draw_map2(map_real, map_updog, map_polyRAD, map_supermassa, map_0.05, main = depth,
group.names = c("Real", "UD", "PR", "SM", "0.05"),
output = paste0(depth,"_maps.jpeg"))
}## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
## Computing 861 recombination fractions ...
## Counts simulation changed 0.3182461 % of the given genotypes
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
## This approach changed 18.66667 % of the genotypes from biallelic markers
## New onemap object contains 18 biallelic markers
## Computing 861 recombination fractions ...
## This approach changed 19.16667 % of the genotypes
## Computing 861 recombination fractions ...
## 16.88889 % of the genotypes were changed by this approach
## 0 were recovered from vcf
## 0 markers from old onemap object were considered non-informative and removed of analysis
## Computing 861 recombination fractions ...
## Computing 861 recombination fractions ...
## Completed
## Output file: /home/rstudio/onemap_ht2/onemap_ht/vignettes_highres/100_maps.jpeg
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
## Computing 861 recombination fractions ...
## Counts simulation changed 1.02546 % of the given genotypes
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
## 1 Markers were removed from the dataset because both of parents are homozygotes, these markers are considered non-informative in outcrossing populations.
## This approach changed 17.72222 % of the genotypes from biallelic markers
## New onemap object contains 18 biallelic markers
## Computing 820 recombination fractions ...
## This approach changed 17.57895 % of the genotypes
## Computing 820 recombination fractions ...
## 17.55556 % of the genotypes were changed by this approach
## 0 were recovered from vcf
## 0 markers from old onemap object were considered non-informative and removed of analysis
## Computing 820 recombination fractions ...
## Computing 820 recombination fractions ...
## Completed
## Output file: /home/rstudio/onemap_ht2/onemap_ht/vignettes_highres/50_maps.jpeg
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
## Computing 861 recombination fractions ...
## Counts simulation changed 9.759547 % of the given genotypes
## Scanning file to determine attributes.
## File attributes:
## meta lines: 4
## header_line: 5
## variant count: 42
## column count: 211
##
Meta line 4 read in.
## All meta lines processed.
## gt matrix initialized.
## Character matrix gt created.
## Character matrix gt rows: 42
## Character matrix gt cols: 211
## skip: 0
## nrows: 42
## row_num: 0
##
Processed variant: 42
## All variants processed
## 3 Markers were removed from the dataset because both of parents are homozygotes, these markers are considered non-informative in outcrossing populations.
## This approach changed 21.0625 % of the genotypes from biallelic markers
## New onemap object contains 18 biallelic markers
## Computing 820 recombination fractions ...
## This approach changed 23.75 % of the genotypes
## Computing 741 recombination fractions ...
## 23.375 % of the genotypes were changed by this approach
## 0 were recovered from vcf
## 0 markers from old onemap object were considered non-informative and removed of analysis
## Computing 741 recombination fractions ...
## Computing 820 recombination fractions ...
## Completed
## Output file: /home/rstudio/onemap_ht2/onemap_ht/vignettes_highres/10_maps.jpeg
Simulate Illumina reads
This one is a bit more complex and the its tools is stored in the github repository onemap_workflows. Please, access it for more information.